Abstract

In this paper, we present a general review of hash functions in the
cryptographic sense. We place special emphasis on particular topics such
as the cipher block chaining message authentication code (CBC MAC) and its
variants. This paper also broadens the information given in [1] by
including more details on block-cipher based hash functions and on the
security of different hash schemes.
1 Introduction



In the dictionary, to hash is defined as 'to chop into small pieces' or
'to muddle, mess up' [2]. Cryptography implicitly combines these two
meanings to define the term hash function: a hash function is a process
that takes an input (a message), scrambles it with an algorithm, and
produces an output, the message digest (hash value), that is smaller than
the input. Hash functions are designed so that a hash value acts like a
fingerprint of the message, meaning that two randomly chosen messages have
the same hash value only with sufficiently small probability. In other
words, hash values should be compressed representatives of the messages
they correspond to.

These properties allow hash functions to be used for data integrity and
message authentication. A hash function can be used for data integrity as
follows. Suppose that a sender (S) wants to send a message M to a receiver
(R). S first computes the hash value h(M) of the message with a hash
function h, and then sends M together with h(M) to R. When R gets these, he
recomputes the hash value of the possibly modified message M and compares
it with the original hash value h(M). If the two are equal, then R believes
that M has not been changed (see footnote 1). If S and R also want to
authenticate the message sent in this transmission, S can compute his
signature by using an encryption algorithm $E_K(\cdot)$ with key K. Here, S
encrypts the hash value h(M) as $C = E_K(h(M))$ and sends M together with
C, instead of h(M), to R. When R gets these, he uses a verification
algorithm $V_K(\cdot)$ and, depending on the result of the verification,
does or does not believe that M was sent by S.

1 Here, we assume that h(M) is not affected during transmission.
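As a rough illustration of the integrity check described above, the
following Python sketch uses SHA-256 from the standard hashlib module as
the hash function h. For the authenticated variant it uses HMAC from the
standard hmac module as a stand-in for the keyed step $E_K(h(M))$. Both
choices are assumptions made for illustration only: the text does not
prescribe a particular hash function or signature scheme, and all function
names below are hypothetical.

```python
import hashlib
import hmac


def digest(message: bytes) -> bytes:
    """Hash value h(M); SHA-256 is an assumed choice for h."""
    return hashlib.sha256(message).digest()


def receiver_accepts(message: bytes, received_digest: bytes) -> bool:
    """R recomputes h(M) and compares it with the digest sent by S."""
    return hmac.compare_digest(digest(message), received_digest)


def tag(key: bytes, message: bytes) -> bytes:
    """Keyed tag standing in for C = E_K(h(M)).

    HMAC is a swapped-in technique, not the scheme from the text.
    """
    return hmac.new(key, message, hashlib.sha256).digest()


def receiver_authenticates(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """R recomputes the keyed tag and compares it with the one received."""
    return hmac.compare_digest(tag(key, message), received_tag)
```

With this sketch, receiver_accepts(M, digest(M)) holds for an unmodified
message, while a modified message is rejected unless the modification
happens to produce the same digest.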
2 A Formal Perspective



It is now time to explore hash functions more formally. 

2.1 Fundamental Definitions and Results 

A very general mathematical definition of a hash function is the following.
Definition 2.1.1 A hash function is a function
$h : \{0,1\}^* \to \{0,1\}^n$ for a fixed positive integer n, with the
property that h(x) is easy to compute for all $x \in \{0,1\}^*$ by anyone.

Although the above definition refers to unkeyed hash functions, there are
also keyed hash functions, where a hash value is determined by two inputs:
a message and a secret key. In [1], a keyed hash function is defined as
follows.
Definition 2.1.2 A keyed hash function is a function
$h : \{0,1\}^\kappa \times \{0,1\}^* \to \{0,1\}^n$, for fixed positive
integers n and $\kappa$, that satisfies the following properties:

1. h is public and h(k, x) is easy to compute for all $x \in \{0,1\}^*$
and $k \in \{0,1\}^\kappa$.

2. Without knowing k, it is hard to find x when h(k, x) is given; it is
also hard to find two distinct messages x and x' with h(k, x) = h(k, x').

3. Given zero or more pairs (x, h(k, x)), it is hard to find k. This
property is called key non-recovery.

4. Without knowing k, it is hard to compute h(k, x) for any x, even when a
large set of pairs $(x_i, h(k, x_i))$ is known, provided that $x \ne x_i$.

There are several approaches to classifying hash functions. One is to
divide them into keyed and unkeyed hash functions, as described above. A
second is to divide them into block-cipher based and non-block-cipher
based hash functions. Another is to classify them by the specific
requirements they satisfy. Whatever the classification, a hash function
should have certain properties for security reasons, depending on the
application it is used for. Some of these properties are as follows.
1. Preimage Resistance: Given $y \in \{0,1\}^n$, it is hard to find any
$x \in \{0,1\}^*$ such that h(x) = y.

2. 2nd-Preimage Resistance: Given x, it is hard to find any $x' \ne x$
such that h(x) = h(x').

3. Collision Resistance: It is hard to find any x and $x' \ne x$ such that
h(x) = h(x').



The most important class of unkeyed hash functions is the manipulation
detection codes (MDCs), which are used for data integrity. MDCs can be
divided into two groups: one way hash functions (OWHFs) and collision
resistant hash functions (CRHFs), defined as follows [3].

Definition 2.1.3 A one way hash function is a hash function with prop- 
erties preimage resistance and second preimage resistance. 
Definition 2.1.4 A collision resistant hash function is a hash function 
with properties 2nd-preimage resistance and collision resistance. 

Some sources define a one way hash function without 2nd-preimage
resistance, and split the definition of a collision resistant hash
function into two: the weakly collision resistant hash function (WCFHF),
which has only 2nd-preimage resistance, and the strongly collision
resistant hash function (SCFHF), which is defined to have only collision
resistance [4].

Corollary 2.1.5 A strongly collision free hash function is also a weakly 
collision free hash function. 

Even though collision resistance implies 2nd-preimage resistance, as the
above corollary states, collision resistance does not imply preimage
resistance. The following example illustrates this.
Example 2.1.6 Let f be a CRHF, and define a new function h as

$$h(x) = \begin{cases} x, & |x| = mn, \\ f(x), & \text{otherwise,} \end{cases}$$

where m is a nonnegative integer.

In the above example, h is just the identity map when |x| = mn, so it has
no collisions in this case. When $|x| \ne mn$, h is equal to f(x), so
finding a collision for h is as hard as finding a collision for f.
Therefore, h is collision resistant. Now take any x such that |x| = mn;
then a preimage of x is trivially x itself. Hence h is not preimage
resistant.

Even though collision resistance does not imply preimage resistance in
general, under a weak assumption on the relative cardinalities of the
domain and range of the hash function, one can prove that collision
resistance does imply preimage resistance. [4] proves this with the
following theorem.
Theorem 2.1.7 Suppose $h : X \to Z$ is a hash function where |X| and |Z|
are finite and $|X| \ge 2|Z|$. Suppose A is an inversion algorithm for h.
Then there exists a probabilistic Las Vegas algorithm which finds a
collision for h with probability at least 1/2.

Even though the above theorem is stated for domains of finite size, the
argument is also valid for the infinite domain $\{0,1\}^*$.
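The idea behind Theorem 2.1.7 can be sketched in a few lines: pick a
random x, hash it, ask the inversion algorithm for a preimage of the
result, and report a collision whenever the returned preimage differs from
x. The toy hash below (the first byte of SHA-256 on 16-bit inputs) and the
brute-force inversion table are assumptions made purely for illustration;
they are not taken from [4].

```python
import hashlib
import random

X_BITS = 16  # |X| = 2^16, |Z| = 2^8, so |X| >= 2|Z| holds


def h(x: int) -> int:
    """Toy hash from 16-bit integers to 8-bit integers (illustration only)."""
    return hashlib.sha256(x.to_bytes(2, "big")).digest()[0]


# Inversion algorithm A, built by brute force: for each hash value z it
# remembers the smallest x with h(x) = z.
_preimage = {}
for _x in range(2 ** X_BITS):
    _preimage.setdefault(h(_x), _x)


def invert(z: int) -> int:
    """Return some preimage of z under h."""
    return _preimage[z]


def find_collision(rng: random.Random):
    """One run of the Las Vegas algorithm.

    Returns a pair (x, x1) with x != x1 and h(x) == h(x1), or None when the
    run fails because A returned x itself.
    """
    x = rng.randrange(2 ** X_BITS)
    x1 = invert(h(x))
    return (x, x1) if x1 != x else None
```

A run fails only when the randomly chosen x is exactly the preimage that A
returns, which is why a large domain relative to the range makes success
likely.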

For keyed hash functions, the most important group is the message
authentication codes (MACs), which are used for message authentication.
The formal definition of a MAC is as follows [3].

Definition 2.1.8 A message authentication code algorithm is a family of
functions $h_k : \{0,1\}^* \to \{0,1\}^n$ parameterized by a secret key
$k \in \{0,1\}^\kappa$, with the following properties:

1. For any k and x, $h_k(x)$ is easy to compute.

2. Given zero or more pairs $(x_i, h_k(x_i))$, it is hard to compute any
pair $(x, h_k(x))$ with $x \ne x_i$. This property is called computation
resistance.

Corollary 2.1.9 A MAC algorithm is preimage resistant, 2nd-preimage
resistant, and collision resistant for parties not knowing the key.
Corollary 2.1.10 Computation resistance implies key non-recovery.

2.2 Basic Security Considerations 

Preimage, 2nd-preimage, and collision resistance are needed to ensure the
security of the application in which the hash function is used. For
instance, suppose that we have a data integrity and message authentication
scheme as described in the introduction of this paper. Now suppose that
there is an adversary A on the communication link between S and R. If A
obtains a signature $y = E_K(z)$ on a message digest z and finds an x such
that z = h(x), then (x, y) is a valid forgery. Thus a secure hash function
should be preimage resistant. Now suppose that S sends
$(x, y = E_K(h(x)))$ to R. If, during this transmission, A finds an
$x' \ne x$ such that h(x') = h(x), then (x', y) is a valid forgery. Hence,
a good hash function should be 2nd-preimage resistant. Finally, suppose
that A finds two distinct x, x' such that h(x) = h(x') and persuades S to
sign h(x); then $(x', y = E_K(h(x)))$ is again a valid forgery. Therefore,
a secure hash function should also be collision resistant.

For a MAC, a forgery means breaking the computation resistance property in
some way. By Corollary 2.1.10, this can be done by recovering the secret
key. However, key recovery is a sufficient but not a necessary condition
for a MAC forgery. Sometimes a forgery can be produced using only the
$(x, h_k(x))$ pairs available to the adversary A. Some sources, such as
[3], classify attacks on a MAC algorithm according to A's ability to
control the known $(x, h_k(x))$ pairs, as follows.

1. Known-Text Attack: One or more pairs $(x_i, h_k(x_i))$ are already
available to A.

2. Chosen-Text Attack: One or more pairs $(x_i, h_k(x_i))$ are available
to A, where the $x_i$ are chosen by A independently of the resulting hash
values.

3. Adaptive Chosen-Text Attack: One or more pairs $(x_i, h_k(x_i))$ are
available to A, where the $x_i$ can be chosen by A successively, based on
the results of prior queries.

When A produces a forgery, there are two possibilities regarding his
control over the fake message he constructs. If he has partial or full
control over the fake message, he is said to make a selective forgery. If,
on the other hand, he is able to construct a fake message but has no
control over its content, he is said to make an existential forgery.
3 Block-Cipher Based Hash Functions



As we stated in Section 2, hash functions can be divided into block-cipher
based and non-block-cipher based hash functions. In this section, we
explore the block-cipher based hash functions, first introducing what a
block cipher is. [5] defines block ciphers mathematically as follows.

3.1 Block Ciphers

Definition 3.1.1 Let k and l be two strictly positive integers. A finite
pseudorandom permutation (PRP), or a block cipher, with key length k and
block length l is a function
$F : \{0,1\}^k \times \{0,1\}^l \to \{0,1\}^l$ where $F(K, \cdot)$ is a
permutation for each $K \in \{0,1\}^k$.

The following definition implicitly gives the notion of the security of a
block cipher.

Definition 3.1.2 Let D be a PRP adversary and let F be a PRP with block
length l and key length k. Define the advantage of D as follows:

$\mathrm{Adv}^{PRP}_D(F) = \Pr[K \leftarrow \{0,1\}^k : D^{F(K,\cdot)} = 1] - \Pr[\pi \leftarrow \mathrm{Perm}_l : D^{\pi(\cdot)} = 1]$,

where $\mathrm{Perm}_l$ is the set of all permutations from $\{0,1\}^l$ to
$\{0,1\}^l$.

The meaning of the above definition is the following. D makes some queries
to its oracle for some amount of time and outputs a '1' bit to indicate
that he believes he has been given the PRP. The two probabilities
correspond to whether the adversary is given the PRP or a random
permutation.

The security of a block cipher is now defined as follows.
Definition 3.1.3 Let k and l be two strictly positive integers. Let F be a
block cipher. We say an adversary D can $(t, q, \epsilon)$-distinguish F
from a random permutation if D runs for at most t steps, makes at most q
queries, and $\mathrm{Adv}^{PRP}_D(F) > \epsilon$.

Example 3.1.4 Here is an example of an insecure block cipher. Let
k = l = 64 and let X be the PRP defined by $X_K(M) = M \oplus K$. Suppose
that we are able to make two queries. We first query $0^{64}$; if we are
given the PRP, the answer is $X_K(0^{64}) = 0^{64} \oplus K = K$, so we
have obtained the key. We then query $1^{64}$. If the answer is
$1^{64} \oplus K$, we output a '1' bit to indicate that we believe we were
given the PRP. A random permutation $\pi$ has the property
$\pi(1^{64}) = \pi(0^{64}) \oplus 1^{64}$ with probability
$1/(2^{64} - 1)$. Hence we compute
$\mathrm{Adv}^{PRP}_D(X) = 1 - 1/(2^{64} - 1)$. We conclude that a secure
block cipher should have a small $\epsilon$ even for large q and t, in the
sense of the above definitions.
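The two-query distinguisher of Example 3.1.4 can be sketched as follows.
The oracle constructors and the lazily sampled random permutation are
hypothetical helpers introduced only for this illustration; blocks are
represented as 64-bit Python integers.

```python
import random

L = 64
ALL_ONES = (1 << L) - 1  # the block 1^64


def xor_cipher(key: int):
    """Oracle for the insecure PRP X_K(M) = M xor K."""
    return lambda m: m ^ key


def lazy_random_permutation(rng: random.Random):
    """Oracle for a random permutation on 64-bit blocks, sampled on demand."""
    table, used = {}, set()

    def pi(m: int) -> int:
        if m not in table:
            c = rng.getrandbits(L)
            while c in used:  # keep the map injective
                c = rng.getrandbits(L)
            table[m] = c
            used.add(c)
        return table[m]

    return pi


def distinguisher(oracle) -> int:
    """Two queries: output 1 if the answers are consistent with X_K."""
    key_guess = oracle(0)  # X_K(0^64) = K
    return 1 if oracle(ALL_ONES) == (ALL_ONES ^ key_guess) else 0
```

Given the XOR cipher the distinguisher always outputs 1, whereas given a
random permutation it outputs 1 only in the unlikely event described in
the example.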

3.2 Types of Block-Cipher Based Hash Functions 

In this section we examine some examples of block-cipher based hash
algorithms. In general, hash algorithms have an iterative nature.
Therefore, it is useful to make the following definition [3] in order to
classify hash functions by the number of block-cipher operations they
perform.

Definition 3.2.1 Let h be an iterated hash function constructed from a
block cipher E which performs s block encryptions to process each message
block. Then the rate of h is 1/s.

We now give some examples of block-cipher based hash functions and
indicate which attacks they are open to; the well-known attack methods
themselves are examined later in this paper.

1. Rabin's Hash 

This hash scheme is due to [6]. First the message is divided into t blocks
whose lengths are equal to the block length of the PRP E. Then the
following rate-1 algorithm gives the hash value.

$H_0 = IV$ (initializing value), $H_i = E(M_i, H_{i-1})$, $H(M) = H_t$, where $i \in \{1, \ldots, t\}$.

The above hash method is insecure and open to the birthday attack for
small hash values, as [7] shows. It is also open to the meet-in-the-middle
attack [8].

2. Combined Plaintext-Ciphertext Chaining Hash 

This rate-1 hash scheme was proposed by [9]; it uses one common secret key
for privacy and authentication. The algorithm is as follows.

$M = M_1 \ldots M_t$, $M_{t+1} = IV$, $H_i = E(K, M_i \oplus M_{i-1} \oplus H_{i-1})$,
$H(M) = H_{t+1}$, where $i \in \{1, \ldots, t\}$.

This algorithm is open to the birthday attack [8].

3. Key Chaining Hash 

This rate-1 hash function was constructed by [10] and [11] to strengthen
Rabin's hash, with the following algorithm.

$H_0 = IV$, $H_i = E(M_i \oplus H_{i-1}, H_{i-1})$, $H(M) = H_t$, where $i \in \{1, \ldots, t\}$.

This algorithm is open to the meet-in-the-middle attack. Even though many
improvements to this scheme have been made, as in [12] and [13], it still
has some weaknesses.

4. Matyas-Meyer-Oseas Hash and Davies-Meyer Hash 

These hash algorithms successfully resist the meet-in-the-middle attack
because of the one-wayness of the underlying PRP. However, they still have
some weaknesses related to the key used [14], [15]. The algorithms (see
footnote 2) are as follows for Matyas-Meyer-Oseas and Davies-Meyer
respectively [3], [8].

$H_0 = IV$, $H_i = E_{g(H_{i-1})}(M_i) \oplus M_i$, $H(M) = H_t$, where $i \in \{1, \ldots, t\}$.

$H_0 = IV$, $H_i = E_{M_i}(H_{i-1}) \oplus H_{i-1}$, $H(M) = H_t$, where $i \in \{1, \ldots, t\}$.

5. Miyaguchi-Preneel Hash 

This rate-1 hash was proposed by [16]. [17] showed that this algorithm is
open to differential cryptanalysis. The algorithm is defined as follows
[3].

$H_0 = IV$, $H_i = E_{g(H_{i-1})}(M_i) \oplus H_{i-1} \oplus M_i$, $H(M) = H_t$, where $i \in \{1, \ldots, t\}$.
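The Matyas-Meyer-Oseas, Davies-Meyer and Miyaguchi-Preneel iterations
above differ only in what is used as the key and what is fed forward. The
sketch below makes this concrete. It assumes k = l = 128 bits, so that g
can be the identity map, an all-zero IV, and messages that are already a
whole number of blocks. The function toy_encrypt is a hypothetical 4-round
Feistel permutation built from SHA-256; it is a placeholder for E and not
a real block cipher.

```python
import hashlib

BLOCK = 16           # block length l in bytes; key length k = l is assumed
IV = bytes(BLOCK)    # all-zero initializing value (an assumption)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation standing in for E(key, block)."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


def split_blocks(msg: bytes):
    assert msg and len(msg) % BLOCK == 0, "no padding in this sketch"
    return [msg[i:i + BLOCK] for i in range(0, len(msg), BLOCK)]


def matyas_meyer_oseas(msg: bytes) -> bytes:
    h = IV
    for m in split_blocks(msg):
        h = xor(toy_encrypt(h, m), m)          # E_{g(H_{i-1})}(M_i) xor M_i
    return h


def davies_meyer(msg: bytes) -> bytes:
    h = IV
    for m in split_blocks(msg):
        h = xor(toy_encrypt(m, h), h)          # E_{M_i}(H_{i-1}) xor H_{i-1}
    return h


def miyaguchi_preneel(msg: bytes) -> bytes:
    h = IV
    for m in split_blocks(msg):
        h = xor(xor(toy_encrypt(h, m), h), m)  # ... xor H_{i-1} xor M_i
    return h
```

Each loop iteration performs one call to the cipher per message block,
which is what rate 1 means in Definition 3.2.1.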

Note that the hash values of all the above schemes have length equal to
the block length of the PRP. However, there are also hash schemes whose
hash values have twice the block length of the PRP. The motivation for
these hash functions is to thwart birthday attacks by expanding the sample
space. We now give some examples of these types of block-cipher based hash
functions.

2 g is a function from $\{0,1\}^l$ to $\{0,1\}^k$; if k = l, then g can be
chosen as the identity map.
6. Yi-Lam Hash

This hash algorithm was proposed by [18]. For this function, we have
k = 2l and |h(M)| = 2l. We have the following algorithm for this hash
function.

$H_0 = IV_1$, $G_0 = IV_2$,

$H_i = E_{K_i}(M_i) \oplus M_i$, $G_i = (E_{K_i}(M_i) \oplus G_{i-1}) [+] H_i$,

$H(M) = H_t \| G_t$, where $i \in \{1, \ldots, t\}$ and [+] is summation modulo $2^l$.

Despite the fact that this hash scheme was conjectured to be secure
against all attacks faster than brute force, it was later proved to be
insecure [19], [20].

7. MDC2 and MDC4

MDC2 and MDC4 are two modification detection codes proposed by IBM, with
rates 1/2 and 1/4 respectively; their algorithms are defined in [3].

We now examine the most widely used block-cipher based MAC algorithm,
CBC MAC.

3.3 CBC MAC

CBC MAC is the most widely used block-cipher based MAC algorithm today.
[21], [22], and [23] give the algorithm of CBC MAC as follows.

$H_0 = IV$, $H_i = E(K, M_i \oplus H_{i-1})$, $H(M) = H_t$, where $i \in \{1, \ldots, t\}$.

[24] proved that CBC MAC is secure for fixed length messages, say for
messages of length ml for some fixed m. However, the default CBC MAC
algorithm is not secure for arbitrary length messages. The following
examples from [3] and [5] make this point clear. In them, $MAC_j$ denotes
the CBC MAC of $M_j$, and we take $IV = 0^l$.

Example 3.3.1 Let $M_1$ be a one-block message and suppose $(M_1, MAC_1)$
is known, where $MAC_1 = E_K(M_1)$. Now consider the MAC of the one-block
message $MAC_1$, which is $E_K(MAC_1) = E_K(E_K(M_1))$. Note that this MAC
is also the MAC of $M_1 \| 0^l$. Hence $(M_1 \| 0^l, E_K(MAC_1))$ is a
valid existential forgery.
Example 3.3.2 Suppose $(M_1, MAC_1)$ and $(M_2, MAC_2)$ are known. Now
consider the MAC of $M_1 \| N$, which is $E_K(MAC_1 \oplus N)$, where N is
an arbitrarily chosen message block. Note that this MAC is also the MAC of
$M_2 \| (MAC_1 \oplus N \oplus MAC_2)$; this again yields a valid forgery.

Example 3.3.3 Let $M_1$ be a one-block message and suppose $(M_1, MAC_1)$
is known. Then the MAC of $M_1 \| (M_1 \oplus MAC_1)$ is immediately
known: with $IV = 0^l$ it equals
$E_K(MAC_1 \oplus M_1 \oplus MAC_1) = E_K(M_1) = MAC_1$.
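The forgeries of Examples 3.3.2 and 3.3.3 can be checked mechanically. The
sketch below assumes a zero IV and a 128-bit block, and uses a
hypothetical 4-round Feistel permutation built from SHA-256 as a
placeholder for E. The identities hold for any permutation, so the choice
of toy cipher does not matter for the demonstration.

```python
import hashlib

BLOCK = 16  # block length l in bytes


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation standing in for E(K, .); not secure."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


def cbc_mac(key: bytes, msg: bytes) -> bytes:
    """Plain CBC MAC with IV = 0^l over a whole number of blocks."""
    assert msg and len(msg) % BLOCK == 0
    h = bytes(BLOCK)
    for i in range(0, len(msg), BLOCK):
        h = toy_encrypt(key, xor(msg[i:i + BLOCK], h))
    return h


key = b"secret key known only to S and R"
m1, m2, n = b"A" * BLOCK, b"B" * BLOCK, b"N" * BLOCK
mac1, mac2 = cbc_mac(key, m1), cbc_mac(key, m2)

# Example 3.3.3: the MAC of M1 || (M1 xor MAC1) equals MAC1.
forged_333 = m1 + xor(m1, mac1)

# Example 3.3.2: M1 || N and M2 || (MAC1 xor N xor MAC2) share one MAC.
forged_332 = m2 + xor(xor(mac1, n), mac2)
```

Here cbc_mac(key, forged_333) equals mac1, and cbc_mac(key, forged_332)
equals cbc_mac(key, m1 + n), even though the forged messages were built
without using the key.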

In order to deal with the above deficiency of the CBC MAC, other variants
of this algorithm were developed and proved to be secure. We present them
in the remainder of this section, following [5].

1. EMAC

Suppose that we have a CBC MAC $h^{CBC}(K_1, M)$. We define the EMAC as
$h^{EMAC}(K_1, K_2, M) = E(K_2, h^{CBC}(K_1, M))$ with an additional key
$K_2 \ne K_1$ [25]. Since the underlying PRP of the MAC only takes inputs
of block length l, the domain of EMAC is $(\{0,1\}^l)^+$. [26] showed that
the probability of breaking EMAC satisfies $\Pr(\text{forge}) < 2a^2/2^l$,
where a is the total number of blocks of the messages whose MACs are
known. Clearly, the number of encryptions is |M|/l + 1.

Even though we have expanded the domain of securely hashed messages from
$\{0,1\}^{ml}$ to $(\{0,1\}^l)^+$, we are still not able to MAC arbitrary
length messages. The first variant of CBC MAC dealing with this problem is
EMAC*, which is defined as follows.

2. EMAC* 

Suppose that we have an EMAC, $h^{EMAC}$. We define EMAC* as
$h^{EMAC^*}(K_1, K_2, M) = h^{EMAC}(K_1, K_2, M \| 10^{l-1-(|M| \bmod l)})$.
The disadvantage of this scheme is that it creates an unnecessary extra
block when |M| = tl for some integer t. The number of encryptions is
$\lceil (|M| + 1)/l \rceil + 1$.

3. ECBC 

To deal with the extra padding problem of EMAC*, a new algorithm called
ECBC was described, which pads only when necessary. The algorithm of ECBC,
$h^{ECBC}$, is defined as follows.

    if $M \in (\{0,1\}^l)^+$ then return $h^{EMAC}(K_1, K_2, M)$
    else return $h^{EMAC}(K_1, K_3, M \| 10^{l-1-(|M| \bmod l)})$.

In the above algorithm, $K_3$ is a key distinct from $K_2$; otherwise,
there are trivial collisions. For example, take
$h^{ECBC}(K_1, K_2, K_3, M)$ and
$h^{ECBC}(K_1, K_2, K_3, M \| 10^{l-1-(|M| \bmod l)})$. If $K_3 = K_2$,
they have the same hash value for any $M \notin (\{0,1\}^l)^+$. The number
of encryptions in ECBC is $\lceil |M|/l \rceil + 1$. Theorem 3.3.5 proves
that ECBC is secure [27], but first consider the following definition.

Definition 3.3.4 Let $l, m, m' \ge 1$. The collision probability of the
CBC MAC, $V_l(m, m')$, is defined as follows.

$V_l(m, m') = \max_{M \in \{0,1\}^{lm},\ M' \in \{0,1\}^{lm'},\ M \ne M'} \Pr[\pi \leftarrow \mathrm{Perm}_l : h^{CBC}(\pi, M) = h^{CBC}(\pi, M')]$

Theorem 3.3.5 Fix $l \ge 1$ and let $N = 2^l$. Let D be an adversary which
asks at most q queries, each of which is at most ml bits. Assume
$m \le N/4$. Then

$\left| \Pr[\pi_1, \pi_2, \pi_3 \leftarrow \mathrm{Perm}_l : D^{h^{ECBC}(\pi_1, \pi_2, \pi_3, \cdot)} = 1] - \Pr[R \leftarrow \mathrm{Rand}(\{0,1\}^*, l) : D^{R(\cdot)} = 1] \right|$

$\le \frac{q^2}{2} V_l(m, m) + \frac{q^2}{2N} \le \frac{(2m^2 + 1) q^2}{N}$.

4. FCBC 

In order to decrease the number of encryptions from
$\lceil |M|/l \rceil + 1$ in ECBC to $\lceil |M|/l \rceil$, another
algorithm called FCBC was constructed, defined as follows.

    if $M \in (\{0,1\}^l)^+$ then $K \leftarrow K_2$, and $P \leftarrow M$
    else $K \leftarrow K_3$, and $P \leftarrow M \| 10^{l-1-(|M| \bmod l)}$
    Let $P = P_1 \ldots P_m$, where $|P_1| = |P_2| = \ldots = |P_m| = l$
    $C_0 = 0^l$
    for $i \leftarrow 1$ to $m - 1$ do
        $C_i \leftarrow E_{K_1}(P_i \oplus C_{i-1})$
    return $E_K(P_m \oplus C_{m-1})$

Theorem 3.3.6 [27] proves the security of the above algorithm.
Theorem 3.3.6 Fix $l \ge 1$ and let $N = 2^l$. Let D be an adversary which
asks at most q queries, each of which is at most ml bits. Assume
$m \le N/4$. Then

$\left| \Pr[\pi_1, \pi_2, \pi_3 \leftarrow \mathrm{Perm}_l : D^{h^{FCBC}(\pi_1, \pi_2, \pi_3, \cdot)} = 1] - \Pr[R \leftarrow \mathrm{Rand}(\{0,1\}^*, l) : D^{R(\cdot)} = 1] \right|$

$\le \frac{q^2}{2} V_l(m, m) + \frac{q^2}{2N} \le \frac{(2m^2 + 1) q^2}{N}$.

5. XCBC 

A more recent CBC MAC variant is the XCBC algorithm, which avoids using
multiple keys for the underlying PRP and hence uses the same key $K_1$ for
all PRP operations. This algorithm is defined as follows.

    if $M \in (\{0,1\}^l)^+$ then $K \leftarrow K_2$, and $P \leftarrow M$
    else $K \leftarrow K_3$, and $P \leftarrow M \| 10^{l-1-(|M| \bmod l)}$
    Let $P = P_1 \ldots P_m$, where $|P_1| = |P_2| = \ldots = |P_m| = l$
    $C_0 = 0^l$
    for $i \leftarrow 1$ to $m - 1$ do
        $C_i \leftarrow E_{K_1}(P_i \oplus C_{i-1})$
    return $E_{K_1}(P_m \oplus C_{m-1} \oplus K)$

Theorem 3.3.7 [27] proves the security of the above algorithm.
Theorem 3.3.7 Fix $l \ge 1$ and let $N = 2^l$. Let D be an adversary which
asks at most q queries, each of which is at most ml bits. Assume
$m \le N/4$. Then

$\left| \Pr[\pi_1 \leftarrow \mathrm{Perm}_l;\ K_2, K_3 \leftarrow \{0,1\}^l : D^{h^{XCBC}(\pi_1, K_2, K_3, \cdot)} = 1] - \Pr[R \leftarrow \mathrm{Rand}(\{0,1\}^*, l) : D^{R(\cdot)} = 1] \right|$

$\le \frac{q^2}{2} V_l(m, m) + \frac{(2m^2 + 1) q^2}{N} \le \frac{(4m^2 + 1) q^2}{N}$.
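The variants ECBC, FCBC and XCBC described above can be compared side by
side in a short sketch. Several assumptions are made here for illustration
only: messages are byte strings rather than bit strings, so the padding
$10^{l-1-(|M| \bmod l)}$ is replaced by its byte-level analogue (one 0x80
byte followed by zero bytes); the block length is 128 bits; and
toy_encrypt is a hypothetical SHA-256 based Feistel permutation standing
in for the PRP E.

```python
import hashlib

BLOCK = 16  # block length l in bytes


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation standing in for E(K, .); not secure."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        f = hashlib.sha256(key + bytes([rnd]) + right).digest()[:8]
        left, right = right, xor(left, f)
    return left + right


def pad(msg: bytes) -> bytes:
    """Byte-level analogue of M || 1 0^{l-1-(|M| mod l)}."""
    return msg + b"\x80" + bytes(BLOCK - 1 - len(msg) % BLOCK)


def is_full(msg: bytes) -> bool:
    """True when M is a non-empty whole number of blocks."""
    return len(msg) > 0 and len(msg) % BLOCK == 0


def blocks(p: bytes):
    return [p[i:i + BLOCK] for i in range(0, len(p), BLOCK)]


def cbc_chain(k1: bytes, parts) -> bytes:
    """C_0 = 0^l, C_i = E_{K1}(P_i xor C_{i-1}); returns the last C_i."""
    c = bytes(BLOCK)
    for p in parts:
        c = toy_encrypt(k1, xor(p, c))
    return c


def ecbc(k1: bytes, k2: bytes, k3: bytes, msg: bytes) -> bytes:
    if is_full(msg):
        return toy_encrypt(k2, cbc_chain(k1, blocks(msg)))
    return toy_encrypt(k3, cbc_chain(k1, blocks(pad(msg))))


def fcbc(k1: bytes, k2: bytes, k3: bytes, msg: bytes) -> bytes:
    k, p = (k2, msg) if is_full(msg) else (k3, pad(msg))
    *head, last = blocks(p)
    return toy_encrypt(k, xor(last, cbc_chain(k1, head)))


def xcbc(k1: bytes, k2: bytes, k3: bytes, msg: bytes) -> bytes:
    # Here k2 and k3 are l-bit strings XORed into the last block.
    k, p = (k2, msg) if is_full(msg) else (k3, pad(msg))
    *head, last = blocks(p)
    return toy_encrypt(k1, xor(xor(last, cbc_chain(k1, head)), k))
```

Calling ecbc with $K_3 = K_2$ on a short message M and on pad(M)
reproduces the trivial collision mentioned for ECBC, while distinct keys
separate the two cases.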

4 Non-Block-Cipher Based Hash Functions 

In this section, we give a brief review of non-block-cipher based hash func- 
tions. The general property of these functions is that they are preimage 
resistant and do not use a PRP as a primitive element. Moreover, they may 
have mathematically complex structures. 

4.1 Some Special Examples 

Now, we will give information on some special examples of non-block-cipher 
based hash functions. 

1. Hash Functions Using Matrices 

[28] (Random Matrix Hashing Algorithm) and [29] propose hash functions
based on matrix algebra. In [28], the hash value is obtained using a
secret key in the form of a t x l matrix. In [29], the hash value h(M) of
a message M is computed as $h(M) = M^T R M$, where R is a randomly chosen
t x t matrix. The scheme of [29] has been shown to have some weaknesses.

2. Hash Functions Using Number Theory 

These functions are based on the difficulty of solving problems in number
theory, and most of them are constructed with a modular arithmetic
operation. Here are some examples.

2.1 RSA-Cipher Block Chaining

The algorithm of this RSA based CBC function is defined as follows [8].

$H_0 = IV$, $H_i = (H_{i-1} \oplus M_i)^e \bmod N$, $H(M) = H_t$, where $i \in \{1, \ldots, t\}$.

The length of N defines the trade-off between the security and speed of
this algorithm. In order to achieve a higher speed, this algorithm was
modified by replacing e with 2, or by applying the modular squaring only
to either $H_{i-1}$ or $M_i$ [12], [30]. Similar constructions using
squaring can be found in [8], [31], [32], [33], [34].

2.2 Chaum-van Heijst-Pfitzmann Hash Function 

This hash function is based on the discrete logarithm problem and is
defined as follows [35]. Let p and q = (p - 1)/2 be two large primes, and
let $\alpha$ and $\beta$ be two primitive elements of $\mathbb{Z}_p$.
Assuming that it is difficult to compute $\log_\alpha \beta$, one defines
the Chaum-van Heijst-Pfitzmann hash function
$h : \{0, \ldots, q-1\}^2 \to \mathbb{Z}_p \setminus \{0\}$ as
$h(m_1, m_2) = \alpha^{m_1} \beta^{m_2} \bmod p$. The contrapositive of the
following theorem proves the security of the above function.

Theorem 4.1.1 Given one collision for the Chaum-van Heijst-Pfitzmann hash
function h, it is easy to compute $\log_\alpha \beta$.

3. Other Special Hash Functions

There are other hash schemes based on different systems. For example,
there are hash functions based on claw-free permutations [36], the
knapsack problem [8], and cellular automata [37].
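To illustrate Theorem 4.1.1 for the Chaum-van Heijst-Pfitzmann hash
function above, the following sketch uses deliberately tiny toy parameters
(p = 227, q = 113, chosen here only so that a collision can be found by
exhaustive search; real parameters must be far larger). From a collision
$\alpha^{x_1} \beta^{x_2} = \alpha^{x_3} \beta^{x_4}$ one gets
$x_1 - x_3 \equiv a (x_4 - x_2) \pmod{p-1}$ with $a = \log_\alpha \beta$,
which is then solved for a. The function names are hypothetical.

```python
from math import gcd

P, Q = 227, 113  # toy parameters: Q and P = 2Q + 1 are both prime


def is_primitive(g: int) -> bool:
    """g generates Z_P^* iff g^2 != 1 and g^Q != 1 (mod P), as P - 1 = 2Q."""
    return pow(g, 2, P) != 1 and pow(g, Q, P) != 1


# Two primitive elements alpha and beta (the two smallest, for simplicity).
ALPHA, BETA = [g for g in range(2, P) if is_primitive(g)][:2]


def h(m1: int, m2: int) -> int:
    return pow(ALPHA, m1, P) * pow(BETA, m2, P) % P


def find_collision():
    """Exhaustive search for two distinct inputs with the same hash value."""
    seen = {}
    for m1 in range(Q):
        for m2 in range(Q):
            d = h(m1, m2)
            if d in seen:
                return seen[d], (m1, m2)
            seen[d] = (m1, m2)
    raise RuntimeError("no collision found")


def log_from_collision(c1, c2) -> int:
    """Recover a = log_alpha(beta) from a collision c1 != c2."""
    (x1, x2), (x3, x4) = c1, c2
    d = gcd(x4 - x2, P - 1)          # d is 1 or 2 because 0 < |x4 - x2| < Q
    n = (P - 1) // d
    a0 = ((x1 - x3) // d) * pow(((x4 - x2) // d) % n, -1, n) % n
    for j in range(d):               # at most two candidates to test
        a = a0 + j * n
        if pow(ALPHA, a, P) == BETA:
            return a
    raise ValueError("inputs do not form a collision")
```

The modular inverse via pow(x, -1, n) requires Python 3.8 or later.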

4.2 MD4 Family 

The MD4 family of functions was already well covered in the appendix of
the survey [1]. Therefore, we do not give their details here again. Some
of these functions are MD2, MD4, MD5, HAVAL, Snefru-8, RIPEMD 128, SHA-1,
RIPEMD 160, MAA, DSA, BCA, FFT-Hash I, FFT-Hash II, and N-Hash. A general
property of these functions is that they are fast on 32-bit machines, and
most of them are considered to be secure. SHA-1 is one of the most widely
used standards today [41].






5 Constructing Hash Functions 



It has been shown that one can construct an SCFHF with an infinite domain
from an SCFHF with a finite domain. The following two theorems summarize
this, with details in [4].

Theorem 5.1 Suppose $h : (\mathbb{Z}_2)^m \to (\mathbb{Z}_2)^t$ is an
SCFHF, where $m \ge t + 2$. Then one can construct a function

$h^* : \bigcup_{i=m}^{\infty} (\mathbb{Z}_2)^i \to (\mathbb{Z}_2)^t$, which is an SCFHF.

Theorem 5.2 Suppose $h : (\mathbb{Z}_2)^{t+1} \to (\mathbb{Z}_2)^t$ is an
SCFHF. Then one can construct a function

$h^* : \bigcup_{i=t+1}^{\infty} (\mathbb{Z}_2)^i \to (\mathbb{Z}_2)^t$, which is an SCFHF.

6 Attack Methods 

Throughout the paper, we have touched on the security of hash functions in
many places. This section therefore complements that information. Here we
discuss the possible methods of attack on hash functions.

In general, a weak hash function is one whose algorithm is open to some
kind of attack, possibly to many. For instance, the simplest possible
attack is, for a message M, to find the correct hash value MD = h(M) with
a random guess, which succeeds with probability $1/2^n$ for an n-bit hash
value. Here are the other possible attacks on hash functions.

1. Birthday Attack 

This attack is based on the famous Birthday Paradox (see footnote 3),
which is stated as follows. Let
$f : \{p_1, p_2, \ldots\} \to \{d_1, \ldots, d_{365}\}$ be a function from
the set of all people (the $p_i$) to the days of the year (the $d_j$); f
simply gives the birthday of the person it is applied to. Then finding two
people $p_i$ and $p_j$ such that $f(p_i) = f(p_j)$, with probability P,
requires about $\sqrt{2n \ln(1/(1-P))}$ applications of this function,
where n = 365 is the number of possible values. The same argument can be
applied to any hash function to find collisions. However, if the hash
values are sufficiently long, this attack is not computationally feasible.

3 This is indeed a mathematical fact, not a paradox.

2. Key Search and Pseudo Attacks 

The key search attack is used against keyed hash functions to recover the
key, and hence to break the hash algorithm. The attack is defined as
follows. Suppose one or more pairs $(M_i, h(K, M_i))$ are known to A. Then
A selects arbitrary elements $K_j$ from the key space $\{0,1\}^k$ and
tests whether $h(K_j, M_i) = h(K, M_i)$ or not. In this way, A tries to
determine a suitable secret key. [38] makes a good analysis of the limits
of this kind of key search technique. In the pseudo attack, again a key is
sought, but this time it is sufficient for the key to work only with the
known pairs $(M_i, h(K, M_i))$. This may lead to some weaknesses. These
techniques are again subject to computational feasibility.

3. Meet-in-the-middle Attack 

The meet-in-the-middle attack is derived from the birthday attack. It is
used against iterated hash schemes to break preimage or 2nd-preimage
resistance. The attack is defined as follows [8]. First, a selected
message is divided into two parts. The attacker works forward from the
initial value and, at the same time, backward from the hash result. The
probability of getting a match at the intermediate stage is the same as
the collision probability in the birthday attack. For details, see [38],
[39].

4. Correcting Block Attack 

This attack is defined as follows. For a given hash value MD, one selects
a message M and appends redundancy to it until the hash value h(M) equals
MD.

5. Differential Cryptanalysis 

This attack is due to [40]; it works by examining the correlation between
differences in the inputs and the corresponding differences in the outputs
of a hash function.

6. Fixed Point Attack

This attack is again for iterated hash schemes; it aims to find a fixed
point at an intermediate stage. For instance, suppose we have a round
function f which satisfies, at an intermediate stage i, the equation
$f(x_i, m_i) = x_i$. Then we may insert additional copies of the block
$m_i$ at this stage and still get the same hash value as a result.
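To make the birthday attack described in item 1 of this section concrete,
the sketch below evaluates the estimate $\sqrt{2n \ln(1/(1-P))}$ and then
searches for a collision on a hash with a deliberately short output. The
truncation of SHA-256 to a few bits is an assumption made only so that the
search finishes quickly; the function names are hypothetical.

```python
import hashlib
import math


def birthday_bound(n: int, p: float) -> float:
    """Approximate number of trials needed for a collision with probability p
    among n equally likely values."""
    return math.sqrt(2 * n * math.log(1 / (1 - p)))


def truncated_hash(data: bytes, bits: int) -> int:
    """The first `bits` bits of SHA-256(data), as an integer."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)


def find_collision(bits: int):
    """Hash successive counters until two of them share a truncated digest.

    Returns the two colliding messages and the number of hash evaluations.
    """
    seen = {}
    counter = 0
    while True:
        msg = counter.to_bytes(8, "big")
        d = truncated_hash(msg, bits)
        if d in seen:
            return seen[d], msg, counter + 1
        seen[d] = msg
        counter += 1
```

For n = 365 and P = 1/2 the bound evaluates to roughly 22.5, in line with
the familiar figure of 23 people. For a hash value of b bits the same
estimate gives on the order of $2^{b/2}$ evaluations, which is why longer
hash values make the attack infeasible.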

7 Conclusion 

We intended to give an overview of almost all types of hash functions con- 
structed during the short history of hash algorithms. We included mathe- 
matical definitions, theorems and facts to make the issue more precise. We 
gave information on sufficient security levels for hash functions by introduc- 
ing possible attack methods on these functions. Hence, a general reader may 
now use this technical report as a survey of hash functions. 

8 Notes 

This paper has not been published yet; for now it can be cited as
'T. Ozsari. A Hash of Hash Functions. Technical Report. Departments of
Mathematics and Computer Engineering, Koc University, October 2003.'